{ "cells": [ { "cell_type": "markdown", "id": "XUdcqLkabYny", "metadata": { "id": "XUdcqLkabYny" }, "source": [ "### **Synthetic Control**\n", "\n", "\n", "### Background\n", "Unlike the Difference-in-Differences (DiD) method, synthetic control is frequently used for datasets with a severe imbalance between the number of control units and the number of treated units. DiD methods typically demand a high degree of comparability between the treated and control groups to justify the critical \"parallel trends\" assumption. However, this assumption is hard to satisfy when the dataset contains only a few treated units, or even a single one, often because of data collection or funding constraints. In this situation, synthetic control reweights the rich information in the control group to offer another way to learn the counterfactuals for the treated unit(s).\n", "\n", "To illustrate the basic idea of synthetic control, suppose there are $N$ units and $T$ time periods in total, and let $Y_{it}$ denote the outcome for unit $i$ in period $t$. Without loss of generality, suppose the first $N_{\\text{tr}}$ units are in the treated group and receive treatment starting from period $T_0+1$. 
The remaining $N_{\\text{co}} := N-N_{\\text{tr}}$ units belong to the control group and are never exposed to treatment.\n", "\n", "\n", "### Algorithm \n", "\n", "There are two main steps in synthetic control methods: \n", "\n", "**Step 1:** Calculate the weights $\\hat{\\omega}_i$ that align the pre-exposure outcome trends of the control units with those of the treated units;\n", "\n", "\\begin{equation}\n", " \\hat{Y}_{it} = \\hat{\\omega}_{i0} + \\sum_{j=N_{\\text{tr}}+1}^{N}\\hat{\\omega}_{ij} Y_{jt}, \\qquad \\forall i\\in\\{1,\\dots, N_{\\text{tr}}\\}, \\forall t\\in \\{1,\\dots,T\\},\n", "\\end{equation}\n", "where \n", "\\begin{equation}\n", "\\hat{\\omega}_i = \\arg\\min_{\\omega} \\sum_{1\\leq t\\leq T_0} \\bigg(Y_{it} - \\omega_{i0} -\\sum_{j=N_{\\text{tr}}+1}^{N} \\omega_{ij} Y_{jt}\\bigg)^2\n", "\\end{equation}\n", "\n", "\n", "**Step 2:** Use the weights to estimate the post-exposure counterfactuals in causal effect estimation.\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "2f85b822", "metadata": {}, "source": [ "### Demo\n", "In the following part, we use the [Abadie-Diamond-Hainmueller California smoking data](https://www.tandfonline.com/doi/abs/10.1198/jasa.2009.ap08746?casa_token=aNT_5z3JHO0AAAAA:vyxa3Kh7WQsLZ0w5CzcvyiV-YvIJHO8kJOgYkfM14zIipcgSLxEJXN2Fr0BCpJax3xihcqbCt9S1) to illustrate how we can calculate the treatment effect on the treated via synthetic control.\n", "\n", "In this dataset, our goal is to study the effects of Proposition 99, a large-scale tobacco\n", "control program that California implemented in 1988. Specifically, annual tobacco consumption was recorded from 1970 to 2000 for a total of $N=39$ states (including California). 
Therefore, this dataset contains $N = 39$ units, with $N_{\\text{co}} = 38$ states in the control group and only one treated unit ($N_{\\text{tr}} = 1$, California), which is treated starting from the $19$th period.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "33d37227", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"| Year | California | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 1970 | 123.0 | 89.8 | 100.3 | 124.8 | 120.0 | 155.0 | 109.9 | 102.4 | 124.8 | 134.6 | ... | 103.6 | 92.7 | 99.8 | 106.4 | 65.5 | 122.6 | 124.3 | 114.5 | 106.4 | 132.2 |\n",
"| 1971 | 121.0 | 95.4 | 104.1 | 125.5 | 117.6 | 161.1 | 115.7 | 108.5 | 125.6 | 139.3 | ... | 115.0 | 96.7 | 106.3 | 108.9 | 67.7 | 124.4 | 128.4 | 111.5 | 105.4 | 131.7 |\n",
"| 1972 | 123.5 | 101.1 | 103.9 | 134.3 | 110.8 | 156.3 | 117.0 | 126.1 | 126.6 | 149.2 | ... | 118.7 | 103.0 | 111.5 | 108.6 | 71.3 | 138.0 | 137.0 | 117.5 | 108.8 | 140.0 |\n",
"| 1973 | 124.4 | 102.9 | 108.0 | 137.9 | 109.3 | 154.7 | 119.8 | 121.8 | 124.4 | 156.0 | ... | 125.5 | 103.5 | 109.7 | 110.4 | 72.7 | 146.8 | 143.1 | 116.6 | 109.5 | 141.2 |\n",
"| 1974 | 126.7 | 108.2 | 109.7 | 132.8 | 112.4 | 151.3 | 123.7 | 125.6 | 131.9 | 159.6 | ... | 129.7 | 108.4 | 114.8 | 114.7 | 75.6 | 151.8 | 149.6 | 119.9 | 111.8 | 145.8 |\n",
"\n",
"5 rows × 39 columns\n", "